Conversation
mempool/check_tx.go
Outdated
| // TODO: do we even do recheck anymore? | ||
| if request.Type == abci.CheckTxType_Recheck { | ||
| return &abci.ResponseCheckTx{Code: abci.CodeTypeOK}, nil | ||
| } |
There was a problem hiding this comment.
we can expect this to never be called with this mempool, yes? cc @mattac21
|
@greptile whats the scoop man |
|
@greptile |
|
@greptile hello |
Greptile SummaryThis PR introduces async recheck support by routing all Key changes:
Issue found:
Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant C as CometBFT
participant H as CheckTxHandler
participant D as TxDecoder
participant W as InsertWorker
participant MP as ExperimentalEVMMempool
C->>H: RequestCheckTx (New or Recheck)
Note over H: ⚠️ Type not checked — Recheck<br/>takes same path as New
H->>D: TxDecoder(request.Tx)
alt decode error
D-->>H: error
H-->>C: ResponseCheckTx (non-OK)
else decode success
D-->>H: sdk.Tx
H->>H: context.WithTimeout(Background, timeout)
H->>MP: Insert(ctx, tx)
MP->>W: send insertRequest{tx, errC}
W->>MP: process tx (validate, add to pool)
W-->>MP: errC <- result
MP-->>H: err (nil or ErrAlreadyKnown etc.)
alt insert OK
H-->>C: ResponseCheckTx{Code: OK}
else insert error (e.g. ErrAlreadyKnown on recheck)
H-->>C: ResponseCheckTx (non-OK)
Note over C: Interprets non-OK recheck as<br/>invalid tx → evicts from mempool
end
end
Last reviewed commit: 4fe1466 |
mattac21
left a comment
There was a problem hiding this comment.
could you also rebase this off of main? the tests are pretty broken on feat/krakatoa but if you go to main you should be able to get all of the system unit and integration tests passing
mempool/check_tx.go
Outdated
|
|
||
| ctx, cancel := context.WithTimeout(context.Background(), timeout) | ||
| defer cancel() | ||
| errC, err := mempool.insert(ctx, tx) |
There was a problem hiding this comment.
the mempool.Insert function already waits on the errC or ctx to be done like you are doing here. could we use that instead of using the private mempool.insert?
| index map[sdk.Tx]int | ||
| keys map[string]int |
There was a problem hiding this comment.
keys is now serving the same purpose as index right? we can remove index now I think
| s.mu.Lock() | ||
| defer s.mu.Unlock() | ||
|
|
||
| if key, ok := cosmosTxKey(tx); ok { |
There was a problem hiding this comment.
hm thinking through this, im not actually sure this replacement is safe to do without rechecking all txs for this account, when we replace a tx we may want to just remove all txs > replaced nonce from the tx store for this account and then let them just be included in the next block.
I think this isn't safe because we would need to recheck all txs after the replaced one onto of the state of this new tx, which we are not doing. for example if we recheck txs 4 5 and 6 of an an account and include them in the tx store, then someone can replace tx 4 with a completely different tx that may have invalidated 5 and 6, but we are not rechecking those against the new tx 4's context, which may then cause the proposal to be invalid.
I think this is an issue for evm txs in the legacypool too that we need to address.
There was a problem hiding this comment.
working on a separate PR to address this issue in both evm and cosmos tx stores
| select { | ||
| case err := <-errC: | ||
| if err != nil { | ||
| return sdkerrors.ResponseCheckTxWithEvents(err, gInfo.GasWanted, gInfo.GasUsed, anteEvents, false), nil |
There was a problem hiding this comment.
were the anteEvents also always nil here? we are missing out on events now as well with this? should we modify the response of mempool.Insert to return some of this info?
There was a problem hiding this comment.
i checked on v0.53.x and main, calling broadcast tx sync never returned anything other than code, tx hash, and a log if it failed.
you can see comet stripping down the response here: https://github.com/cometbft/cometbft/blob/1bb8b386fc366bc4655dede0535e16d1ad669c7d/rpc/core/mempool.go#L58-L64
There was a problem hiding this comment.
got it, tested on gaia as well earlier and this was the case too
f7e5c47 to
c54ef7b
Compare
|
@greptile re-review |
|
@greptile re-review |
| return func(_ types.RunTx, request *abci.RequestCheckTx) (*abci.ResponseCheckTx, error) { | ||
| tx, err := mempool.txConfig.TxDecoder()(request.Tx) | ||
| if err != nil { | ||
| // detect if there is a nonce gap error (only returned for EVM transactions) | ||
| if errors.Is(err, ErrNonceGap) || errors.Is(err, ErrNonceLow) { | ||
| // send it to the mempool for further triage | ||
| err := mempool.InsertInvalidNonce(request.Tx) | ||
| if err != nil { | ||
| return sdkerrors.ResponseCheckTxWithEvents(err, gInfo.GasWanted, gInfo.GasUsed, anteEvents, false), nil | ||
| } | ||
| } | ||
| // If its already known, this can mean the the tx was promoted from nonce gap to valid | ||
| // and by allowing ErrAlreadyKnown to be silent, we allow re-gossiping of such txs | ||
| // this also covers the case of re-submission of the same tx enforcing overpricing for replacement | ||
| if errors.Is(err, txpool.ErrAlreadyKnown) { | ||
| return sdkerrors.ResponseCheckTxWithEvents(nil, gInfo.GasWanted, gInfo.GasUsed, anteEvents, false), nil | ||
| } | ||
|
|
||
| // anything else, return regular error | ||
| return sdkerrors.ResponseCheckTxWithEvents(err, gInfo.GasWanted, gInfo.GasUsed, anteEvents, false), nil | ||
| return sdkerrors.ResponseCheckTxWithEvents(err, 0, 0, nil, debug), nil | ||
| } | ||
|
|
||
| return &abci.ResponseCheckTx{ | ||
| GasWanted: int64(gInfo.GasWanted), // #nosec G115 -- this is copied from the Cosmos SDK | ||
| GasUsed: int64(gInfo.GasUsed), // #nosec G115 -- this is copied from the Cosmos SDK | ||
| Log: result.Log, | ||
| Data: result.Data, | ||
| Events: types.MarkEventsToIndex(result.Events, nil), | ||
| }, nil | ||
| ctx, cancel := context.WithTimeout(context.Background(), timeout) | ||
| defer cancel() | ||
| if err := mempool.Insert(ctx, tx); err != nil { | ||
| return sdkerrors.ResponseCheckTxWithEvents(err, 0, 0, nil, debug), nil | ||
| } | ||
| return &abci.ResponseCheckTx{Code: abci.CodeTypeOK}, nil | ||
| } |
There was a problem hiding this comment.
Recheck requests not short-circuited — breaks TestRecheckIsNoOp and risks tx eviction
The handler does not inspect request.Type, so abci.CheckTxType_Recheck requests take the same code path as new transactions.
Two concrete failure modes follow:
-
TestRecheckIsNoOpwill fail. The test submits[]byte("not-a-real-tx")withCheckTxType_Recheckand assertsCodeTypeOK. With the current code the decoder will return an error andsdkerrors.ResponseCheckTxWithEventswill produce a non-CodeTypeOKresponse. -
Valid in-mempool transactions get evicted during recheck. When CometBFT fires a recheck for a tx that is already in the pool,
mempool.Insertwill returnErrAlreadyKnown(non-OK). CometBFT interprets a non-OK recheck response as the transaction having become invalid and removes it from its tracking — silently draining the mempool.
Since the app sets OperateExclusively = true, CometBFT delegates full mempool management to the application layer. Recheck is therefore a no-op from CometBFT's perspective and should always return CodeTypeOK without hitting the insert worker:
return func(_ types.RunTx, request *abci.RequestCheckTx) (*abci.ResponseCheckTx, error) {
if request.Type == abci.CheckTxType_Recheck {
return &abci.ResponseCheckTx{Code: abci.CodeTypeOK}, nil
}
tx, err := mempool.txConfig.TxDecoder()(request.Tx)
...
}
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1062 +/- ##
==========================================
+ Coverage 64.11% 65.27% +1.16%
==========================================
Files 331 331
Lines 23303 23302 -1
==========================================
+ Hits 14941 15211 +270
+ Misses 6946 6920 -26
+ Partials 1416 1171 -245
🚀 New features to boost your workflow:
|
Description
we recently updated comet to no longer lock on recheck, pushing concurrency responsibility to the application.
changes:
Closes: STACK-2455
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
mainbranch